Developing a Common Metric in Item Response Theory


Abstract 

A common problem arises when independent estimates of item parameters
from two separate data sets must be expressed in the same metric.
This problem is frequently confronted in studies of horizontal and
vertical equating and in studies of item bias. This paper discusses
a number of methods for transforming one metric to another metric
and presents a new method. Data are given comparing this new method
with a current method, and recommendations are made.
Developing a Common Metric in Item Response Theory*


Introduction 


Suppose  that  item  parameters  for  a  given  set  of  items  have  been 
independently  estimated  using  data  obtained  from  two  different  groups 
of  examinees.  These  item  parameter  estimates  will  be  different  because 
the  metric  or  scale  defined  by  each  independent  calibration  of  the  items 
is  different.  Many  applications  of  item  response  theory  (IRT)  require 
that  these  item  parameter  estimates  be  expressed  in  the  same  metric. 

Such  applications  include  vertical  score-scale  equating,  horizontal 
score-scale  equating,  and  item  bias  studies. 

It  is  possible  to  transform  item  parameter  estimates  in  one  metric 
to  another  metric  by  a  number  of  different  methods.  This  paper  will 
discuss  the  nature  of  these  scale  transformations,  survey  a  number  of 
current  transformation  methods,  and  present  a  new  method  and  some 
results  of  its  application. 


The  Nature  of  Scale  Transformations 


Item response theory models $P_i(\theta_a; \alpha_i, \beta_i, \gamma_i)$, the
probability of a correct response to item $i$ by a person with ability
level $\theta_a$. In typical models, $P_i(\theta_a; \alpha_i, \beta_i, \gamma_i)$
is a function of $\alpha_i(\theta_a - \beta_i)$, where $\alpha_i$ is the item
discrimination, $\beta_i$ is the item difficulty,


*This work was supported in part by contract N00014-80-C-0402,
project designation NR150-453 between the Office of Naval Research and
Educational Testing Service. Reproduction in whole or in part is
permitted for any purpose of the United States Government.
and $\gamma_i$ is the probability that an individual of very low ability
answers the item correctly. When $P_i(\theta_a; \alpha_i, \beta_i, \gamma_i)$ is
a function of $\alpha_i(\theta_a - \beta_i)$, the origin and unit of measurement
of the ability (and difficulty) metric are undetermined. That is to say,
suppose $\theta_a$ is transformed by a linear transformation, producing
$\theta_a^*$. Suppose the same linear transformation is applied to $\beta_i$
to produce $\beta_i^*$. Finally, $\alpha_i$ is divided by the multiplicative
constant of the linear transformation to produce $\alpha_i^*$. These
transformations will not change the probability of a correct response:

$$P_i(\theta_a^*; \alpha_i^*, \beta_i^*, \gamma_i) = P_i(\theta_a; \alpha_i, \beta_i, \gamma_i)$$

Notice that no transformation is necessary for the $\gamma_i$ because
$\gamma_i$ is on the probability metric.
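The invariance just described can be checked numerically. The following is a minimal sketch, assuming the three-parameter logistic form with scaling constant D = 1.7 (an assumption; the argument holds for any model that depends on the discrimination times the difference between ability and difficulty). The function names and the numeric values are illustrative only.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic response function (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def rescale(theta, a, b, A, B):
    """Apply theta* = A*theta + B, b* = A*b + B, a* = a/A; c is unchanged."""
    return A * theta + B, a / A, A * b + B

# Illustrative (made-up) parameter values and transformation constants.
theta, a, b, c = 0.4, 1.2, -0.3, 0.2
A, B = 1.5, -0.7

theta_star, a_star, b_star = rescale(theta, a, b, A, B)
p_before = p3pl(theta, a, b, c)
p_after = p3pl(theta_star, a_star, b_star, c)
print(p_before, p_after)  # the two probabilities agree up to rounding
```

The product a*(theta* - b*) equals a(theta - b) for any A > 0 and any B, which is why the probability is unchanged.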

If an item is calibrated, i.e., its parameters are estimated, as
part of one test, and then calibrated as part of a second test given to
a different group, the actual values of the estimates of the parameters
will differ because the scales established by the two calibrations differ.
However, the relationship between these two scales will be linear
since they differ only in origin and unit of measurement.

If $b_{i1}$ is the estimate of item difficulty from the calibration
of item $i$ in test 1, and $b_{i2}$ is the estimate of the same item
difficulty from the calibration of test 2, then $b_{i2}^*$, the value of
$b_{i2}$ transformed to the scale of test 1, is

$$b_{i2}^* = A b_{i2} + B \qquad (1)$$
where A and B are constants of the linear transformation of scale.

If estimated item difficulties are transformed by a linear transformation,
estimated abilities must be transformed by the same transformation, thus

$$\theta_{a2}^* = A \theta_{a2} + B \qquad (2)$$

If estimated item difficulty and ability are transformed by these linear
expressions, then estimated item discrimination is transformed by

$$a_{i2}^* = a_{i2} / A \qquad (3)$$

These transformations do not change $a_{i2}(\theta_{a2} - b_{i2})$; consequently

$$P_i(\theta_{a2}^*; a_{i2}^*, b_{i2}^*, c_{i2}) = P_i(\theta_{a2}; a_{i2}, b_{i2}, c_{i2})$$

The problem of transforming the scales reduces to the problem of
finding the appropriate A and B of the linear transformation. If we
were dealing with true values of the parameters on their respective scales,
it would be simple to find the correct values of A and B; we could plot
the values of two or more item difficulties and determine the line passing
through them. But we do not have true values; we have only estimates of
them, and these estimates contain error. The estimated item difficulties will
not fall into a straight line, but be scattered around some straight line.
All methods of transforming scales attempt to estimate the parameters of
this line by various techniques, and are applicable to any IRT model where
$P_i(\theta_a; \alpha_i, \beta_i, \gamma_i)$ is a function of $\alpha_i(\theta_a - \beta_i)$.


Current Methods

Superficially, the problem of finding the linear relationship
between two sets of numbers might seem to call for simple regression
techniques. The estimated item difficulties (or abilities) from one
calibration might be used as the independent variable, and those obtained
from the second calibration as the dependent variable. This approach
would be incorrect. A regression approach assumes the independent variable
is measured without error; we know this is not the case. But more
important, a regression procedure is not symmetric with respect to its
treatment of the two estimates of item difficulties. Since we have no
reason for emphasizing or favoring one estimate of item difficulty over
another estimate of the same item difficulty, we require a symmetric
procedure.
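The asymmetry of regression can be illustrated with a small numerical sketch. The difficulty values below are made up for illustration, and the helper name `ols_slope` is hypothetical. The sketch contrasts the two ordinary least-squares slopes (which are not reciprocals of one another when the points scatter) with the ratio of standard deviations, which is symmetric.

```python
import statistics as st

# Made-up difficulty estimates for five common items from two calibrations.
b1 = [-1.2, -0.5, 0.1, 0.6, 1.4]
b2 = [-0.9, -0.6, 0.3, 0.2, 1.1]

def ols_slope(x, y):
    """Least-squares slope for predicting y from x."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

slope_1_on_2 = ols_slope(b2, b1)   # regress calibration 1 on calibration 2
slope_2_on_1 = ols_slope(b1, b2)   # regress calibration 2 on calibration 1

# Ratio of standard deviations in each direction.
ratio_1_from_2 = st.pstdev(b1) / st.pstdev(b2)
ratio_2_from_1 = st.pstdev(b2) / st.pstdev(b1)

# The product of the regression slopes is the squared correlation (< 1 here),
# so inverting one regression line does not give the other.
print(slope_1_on_2 * slope_2_on_1, ratio_1_from_2 * ratio_2_from_1)
```

Transforming scale 2 to scale 1 and then back again returns the starting values only for the symmetric slope.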

A class of symmetric methods uses the first two moments of the
distributions of estimated item difficulties. These methods find the
parameters of the linear transformation, A and B, such that the mean and
standard deviation of the transformed distribution of estimated item
difficulties from the second calibration are equal to the mean and standard
deviation of the estimated item difficulties from the first calibration.
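A minimal sketch of the unweighted mean and sigma computation follows. Matching the first two moments implies that A is the ratio of the standard deviations and B is the difference of the means after scaling. The function name `mean_sigma` and the data are illustrative assumptions; the example uses error-free values so the known transformation is recovered exactly.

```python
import statistics as st

def mean_sigma(b1, b2):
    """Return (A, B) so that A*b2 + B has the mean and SD of b1."""
    A = st.pstdev(b1) / st.pstdev(b2)
    B = st.mean(b1) - A * st.mean(b2)
    return A, B

# Made-up second-calibration difficulties; the first-calibration values are
# generated without error from a known line, purely for illustration.
b2 = [-1.0, -0.2, 0.3, 0.9, 1.5]
b1 = [1.25 * x + 0.4 for x in b2]

A, B = mean_sigma(b1, b2)
b2_star = [A * x + B for x in b2]   # second calibration on the first scale
print(A, B)
```

With real estimates the points scatter around the line, and a few poorly estimated difficulties can distort both moments.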

A simple application of this method is found in Marco (1977) and in
Cook, Eignor, and Hutten (1979). Poorly estimated item difficulties may
have a serious impact on the computation of sample moments, however,
producing a linear transformation that is not useful. Cook et al. (1979)
attempt to solve this by restricting the range of the difficulties used
in computing moments.

Bejar and Wingersky (1981) use a more elaborate approach. Robust
methods that give smaller weights to outlying points are used to estimate
the moments. Linn, Levine, Hastings, & Wardrop (1980) attempt to
reduce the influence of outliers by using weighted moments where the
weights are inversely proportional to the estimated standard error of
the estimates of the item difficulties.

The Bejar and Wingersky procedure treats all outliers in the same
fashion, regardless of their standard error. The Linn et al. procedure
treats all points with the same standard error in the same fashion,
regardless of their outlier status. A procedure was developed by Lord
and Stocking which attempts to overcome these potential problems. This
procedure begins with a weighted estimate of the transformation exactly
as in Linn et al. A robust procedure is then used to give small weights to
those values whose perpendicular distance from this initial line is large,
and a new line is estimated. The robust weighting is repeated until the
change in the perpendicular distances becomes small. Details of this method
are presented in the Appendix. Some results of this method will be described
in subsequent sections of this paper.

A drawback of all of these "mean and sigma" transformation procedures
is that they are typically applied only to the estimated item difficulties.
That is, the A and B of the linear transformation of scale are
estimated using only the $b_i$, and then applied to transform the
$\theta_a$ and the $a_i$. While this is theoretically correct, better methods
may exist which use more of the information available from the calibrations.


A class of methods, called "characteristic curve methods" in this
paper, uses more information from calibrations. Each calibration of an
item yields an estimated item response function or item characteristic curve
$\hat{P}_i(\theta) \equiv P_i(\theta; a_i, b_i, c_i)$. If estimates were error
free, the proper choice of A and B for the linear transformation would cause
these two curves to coincide. Haebara (1980) averages the squared difference
between the individual item response functions over a suitable distribution
of $\theta$, sums over the items common to the two calibrations, and chooses
A and B to minimize this sum. Divgi (1980) chooses the A and B of the linear
transformation to minimize the maximum difference between the sum of item
response functions for the first calibration and the sum of the item
response functions for the second calibration.
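As a loose illustration of an item-level criterion of the kind attributed above to Haebara (1980), the sketch below averages squared differences between the two estimated response functions over a grid of abilities and sums over common items. The three-parameter logistic form with D = 1.7, the equally weighted ability grid, the function names, and all numeric values are assumptions made for illustration; the published method may differ in its details.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic response function (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_level_loss(A, B, items1, items2, thetas):
    """Sum over items of the mean squared gap between response curves."""
    total = 0.0
    for (a1, b1, c1), (a2, b2, c2) in zip(items1, items2):
        gaps = [(p3pl(t, a1, b1, c1) - p3pl(t, a2 / A, A * b2 + B, c2)) ** 2
                for t in thetas]
        total += sum(gaps) / len(gaps)
    return total

# Made-up (a, b, c) estimates from the second calibration.
items2 = [(1.0, -0.5, 0.20), (0.8, 0.3, 0.15), (1.4, 1.0, 0.25)]
true_A, true_B = 1.2, -0.3
# Error-free first-calibration values generated from the known transformation.
items1 = [(a / true_A, true_A * b + true_B, c) for a, b, c in items2]
thetas = [-3.0 + 0.25 * k for k in range(25)]   # assumed ability grid

loss_at_truth = item_level_loss(true_A, true_B, items1, items2, thetas)
loss_at_identity = item_level_loss(1.0, 0.0, items1, items2, thetas)
print(loss_at_truth, loss_at_identity)
```

With error-free estimates the loss vanishes at the correct A and B and is positive elsewhere, which is the sense in which the curves "coincide."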

The  New  Method 

This  method  falls  into  the  class  of  characteristic  curve  methods. 

An examinee, $a$, with ability $\theta_a$ has a true score $\xi_a$ defined
by

$$\xi_a = \xi(\theta_a) \equiv \sum_{i=1}^{n} P_i(\theta_a; a_i, b_i, c_i) \qquad (4)$$


where $n$ is the number of items in the test. The correct linear
transformation of scales from two different calibrations of the same test would
produce the same true scores for examinee $a$ if the $\theta_a$ were
known. If $\xi_a^*$ is the estimated true score obtained from the second
calibration of the test after it has been transformed to the scale of the
first, then

$$\xi_a^* = \xi^*(\theta_a) \equiv \sum_{i=1}^{n} P_i(\theta_a; a_i^*, b_i^*, c_i) \qquad (5)$$


For an examinee, the difference $(\xi_a - \xi_a^*)$ should be small. In
practice, we want to choose A and B such that for a suitable
group of examinees, the average squared difference between true score
estimates is as small as possible. The function to be minimized
is

$$F = \frac{1}{N} \sum_{a=1}^{N} (\xi_a - \xi_a^*)^2 \qquad (6)$$

where $N$ is the number of examinees in the arbitrary group.
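The criterion F can be written out directly. The sketch below is one possible reading, assuming the three-parameter logistic model with D = 1.7; the function names, the item parameter values, and the ability values are made up for illustration. Unlike an item-level criterion, differences are taken between sums of response functions (true scores) before squaring.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic response function (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Sum of the item response functions at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def F(A, B, items1, items2, thetas):
    """Average squared difference between true score estimates."""
    items2_star = [(a / A, A * b + B, c) for a, b, c in items2]
    diffs = [true_score(t, items1) - true_score(t, items2_star) for t in thetas]
    return sum(d * d for d in diffs) / len(diffs)

# Made-up second-calibration estimates and a known transformation.
items2 = [(1.0, -1.0, 0.20), (0.8, 0.0, 0.15), (1.4, 0.5, 0.25)]
true_A, true_B = 1.2, -0.3
items1 = [(a / true_A, true_A * b + true_B, c) for a, b, c in items2]
thetas = [-2.5, -1.0, 0.0, 1.0, 2.5]   # abilities of the "arbitrary group"

F_at_truth = F(true_A, true_B, items1, items2, thetas)
F_at_identity = F(1.0, 0.0, items1, items2, thetas)
print(F_at_truth, F_at_identity)
```

With error-free inputs F is zero at the generating transformation; with real estimates the minimum is positive and must be located numerically.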

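This function F, considered as a function of A and B, will be
minimized when

$$\frac{\partial F}{\partial A} = -\frac{2}{N} \sum_{a=1}^{N} (\xi_a - \xi_a^*) \frac{\partial \xi_a^*}{\partial A} = 0 \qquad (7)$$

and

$$\frac{\partial F}{\partial B} = -\frac{2}{N} \sum_{a=1}^{N} (\xi_a - \xi_a^*) \frac{\partial \xi_a^*}{\partial B} = 0 \qquad (8)$$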
The functional form of the partial derivatives of the item response
function depends on the mathematical model chosen. Formulas
for the partial derivatives for the three-parameter logistic item
response function are given in Lord (1980, Chapter 4).

Once the functional form for the item response function is chosen,
its derivatives are substituted into equations (10) and (12). These
new expressions are then substituted into equations (7) and (8) to
find the location of the minimum of F in equation (6).

In the applications described in the following section, the
arbitrary group of examinees over which the function was minimized
was chosen to be a spaced sample of about 200 examinees from the first
calibration of a test. The parameters A and B of the linear
transformation were found by minimizing F using the multivariate search
technique by Davidon (1959) and Fletcher and Powell (1963).
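The minimization can be sketched without reproducing the Davidon-Fletcher-Powell technique. The example below substitutes a simple derivative-free compass search, which is not the method used in the study; it only illustrates that A and B can be recovered by minimizing F. The model form (three-parameter logistic, D = 1.7), the function names, the starting point, and all data are assumptions.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL response function, arranged so that exp never overflows."""
    z = D * a * (theta - b)
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic

def make_F(items1, items2, thetas):
    """Build F(x), x = [A, B], for fixed estimates and abilities."""
    xi = [sum(p3pl(t, a, b, c) for a, b, c in items1) for t in thetas]
    def F(x):
        A, B = x
        if A <= 0:                       # restrict the search to positive slopes
            return float("inf")
        xi_star = [sum(p3pl(t, a / A, A * b + B, c) for a, b, c in items2)
                   for t in thetas]
        return sum((u - v) ** 2 for u, v in zip(xi, xi_star)) / len(thetas)
    return F

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=5000):
    """Try +/- step along each coordinate; halve the step when nothing helps."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = list(x)
                trial[i] += d
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0
    return x, fx

# Made-up estimates; first-calibration values follow a known transformation.
items2 = [(1.0, -1.0, 0.20), (0.8, 0.0, 0.15), (1.4, 0.5, 0.25), (1.1, 1.2, 0.20)]
items1 = [(a / 1.2, 1.2 * b - 0.3, c) for a, b, c in items2]
thetas = [-3.0 + 0.3 * k for k in range(21)]

F = make_F(items1, items2, thetas)
(A_hat, B_hat), F_min = compass_search(F, [1.0, 0.0])
print(A_hat, B_hat, F_min)
```

A quasi-Newton method such as the one cited would use the derivatives in equations (7) and (8) and typically needs far fewer function evaluations.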

Results 


The  Data  and  Analyses 

Data from about 2000 examinees from each of 12 separate administrations
of the Scholastic Aptitude Test (SAT) were selected for this study.
The SAT consists of six 30-minute sections: two operational verbal
sections, two operational mathematical sections, one Test of Standard
Written English (TSWE) and one variable section containing equating or
pretest items. The two verbal sections contain 40 and 45 items
respectively; mathematical sections are 25 and 35 items respectively.
Verbal equating or pretest sections are 40 items long; corresponding
mathematical sections are 25 items long. TSWE data were not used in
this study.

Each box in Exhibit 1 represents the operational sections, either
verbal or mathematical, of a particular form of the SAT (upper case letters
and numbers) and the equating section administered with that test form
(lower case letters). Each box contains items that are the same as items
shown in boxes above and below it. For example, the second box in the
verbal series contains items designated "X2fe." The "fe" items overlap
with those contained in the box labeled "V4fe"; the "X2" items overlap
with those contained in the box labeled "X2fm." The last box in each
of the verbal and mathematical series contains items that overlap with
the items in the first box, thus forming a closed chain.

Each box represents a separate calibration run using the computer
program LOGIST (Wingersky, in press; Wingersky, Barton, & Lord, 1982). For
both the verbal chain and the mathematical chain, the scale established by
the calibration of the items in the first box in the chain was arbitrarily
chosen as the "base scale" for that chain. The estimates of item
parameters for the overlapping items were then used to transform the scales
established by the separate calibrations onto the appropriate base
scale. For the verbal chain, for example, X2fe was transformed to the
scale of V4fe using the item parameter estimates for the fe items that
appear in both calibrations. Then X2fm was transformed to the scale of
the transformed X2fe items, using the item parameter estimates for the
X2 common items. This, of course, places the X2fm items on the V4fe scale.
The next set of items, Y3fm, was transformed to the scale of the
transformed X2fm items and so forth, until all items were placed on the scale
of V4fe.
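The algebra of chaining can be made explicit. In the procedure described above, each calibration is transformed against estimates that are already on the base scale, so composition is implicit. The sketch below instead assumes each link (A, B) was estimated between adjacent, untransformed calibrations and composes them into a single transformation onto the base scale. The function names and link values are hypothetical.

```python
def compose(outer, inner):
    """Return the linear map x -> outer(inner(x)) as an (A, B) pair."""
    A_o, B_o = outer
    A_i, B_i = inner
    return A_o * A_i, A_o * B_i + B_o

def chain_to_base(links):
    """links[k] maps scale k+2 onto scale k+1; scale 1 is the base scale."""
    total = (1.0, 0.0)                  # identity transformation
    for link in links:
        total = compose(total, link)
    return total

# Made-up links: scale 2 -> scale 1, then scale 3 -> scale 2.
links = [(1.1, 0.2), (0.9, -0.1)]
A, B = chain_to_base(links)

# Check against applying the links one at a time to a difficulty of 0.5.
step_by_step = 1.1 * (0.9 * 0.5 - 0.1) + 0.2
print(A, B, A * 0.5 + B, step_by_step)
```

For a closed chain, the composite of all links should be close to the identity, A near 1 and B near 0, and departures from it reflect accumulated estimation error.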

This sequential transformation process was performed in two ways:
(1) the robust mean and sigma Lord and Stocking method described in
the Appendix and (2) the new characteristic curve method described
previously. This allows the comparison of the end results of the
chaining process between the two transformation methods, but does not
allow the comparison of the results of individual "links" in the chain.

Exhibit 1: Verbal and Mathematical Chains. Each box contains verbal or
mathematical sections (capital letters and numbers) and an equating
section (small letters).

To compare individual links in the chain, each link in the chain
from the robust mean and sigma method was repeated exactly with the
characteristic curve method. For example, in the verbal chain, X2fm
was transformed to the scale of the (mean and sigma) transformed X2fe
by the mean and sigma method as part of the sequential chaining using
this method. This link was repeated exactly by using the characteristic
curve method to transform X2fm to the scale of the (mean and sigma)
transformed X2fe. In contrast to the chain of characteristic curve
transformations, this series of characteristic curve transformations
does not form a chain.

Results  of  Transformations  for  Verbal  Items — Individual  Links 

A typical comparison of individual links is shown in Figures 1
and 2. In Figure 1, the horizontal axis is the (robust mean and sigma)
transformed item difficulties for operational section X2 from the X2fe
calibration. The vertical axis is the scale of the item difficulties
for operational section X2 from the X2fm calibration. In Figure 2, the
horizontal axis is the scale of the (robust mean and sigma) transformed
item discriminations from X2 of X2fe. The vertical axis is the scale
of the item discriminations of X2 from X2fm. The solid line through
the points in each figure is the linear transformation estimated by
the robust mean and sigma method. The dashed line is the linear
transformation estimated by the new characteristic curve method. The
linear transformations do not differ much.

Figure 1. The two transformations for item difficulties
compared for a typical verbal link. (Legend: weighted mean and sigma
transformation; characteristic curve transformation.)

The  largest  difference  found  between  the  two  methods  for  the  verbal 
chain  is  shown  in  Figures  3  and  4.  Figure  3  shows  the  presence  of  six 
points  which  could  be  considered  outliers.  The  robust  mean  and  sigma 
method  explicitly  tries  to  deal  with  these  points,  first  by  giving  them 
low  weights  if  the  estimated  standard  errors  are  large,  and  then  by 
giving  them  low  weights  if  the  perpendicular  distance  to  the  initial  line 
is large. These points all ended up with weights that were very small
or zero, so some available information may have been discarded. The
characteristic  curve  method  does  not  discard  any  information.  No  other 
verbal  link  contained  as  many  outliers  as  this  one.  It  is  possible  that 
the  difference  between  the  two  methods  is  due  to  their  differential 
discarding  of  information. 

On  the  whole,  the  direct  comparison  of  individual  links  shows  little 
difference  between  the  two  transformation  methods  for  verbal  data. 

Results  of  Transformations  for  Mathematical  Items — Individual  Links 

Most of the comparisons of the two transformation methods using
mathematical data show little difference between the two methods.
There are exceptions, one of which is shown in Figures 5 and 6.
Inspection of Figure 5 shows the characteristic curve transformation
is clearly a better fit to the data than the robust mean and sigma
transformation. This difference is more visible in Figure 6 where
the robust mean and sigma transformation of the item discriminations
produces unsatisfactory results. The line does not bisect the point
cloud; there are only 18 out of 60 points below the line. The
characteristic curve transformation was better; 31 out of 60 points
are below the line.

Figure 4. The two transformations for item discriminations
compared for the worst verbal link. (Axis label: SAT verbal item
discrimination.)

There  were  two  links  which  produced  comparisons  of  this  kind.  That 
is,  the  characteristic  curve  transformation  worked  better  than  the  mean 
and  sigma  transformation  in  both  the  fit  to  the  item  difficulties  and 
the  fit  to  the  item  discriminations.  There  were  no  links  in  which  the 
mean  and  sigma  transformation  fit  both  the  item  difficulties  and  item 
discriminations  better. 

Chain  Results 

The  cumulative  results  of  chains  of  transformations  may  be  evaluated 
by  transforming  the  last  (transformed)  set  of  items  in  the  chain 
directly  to  the  base  scale  defined  by  the  first  set  of  items.  Since 
the  first  and  last  sets  of  items  are  identical,  this  transformation 
should  be  an  identity  transformation.  Figure  7  shows  this  comparison 
of  each  transformation  method  for  the  SAT  verbal  chain,  and  the  identity 
transformation.  The  difficulties  for  items  common  to  the  first  and 
last  set  of  items  are  plotted  on  the  horizontal  axis.  Figure  8  displays 
the  same  information  for  the  SAT  mathematical  chain. 


Figure 7. The final transformations for the SAT verbal chain.
(Axis label: item difficulties from V4fe.)

(Plot title: SAT math comparison of chain results.)


The  robust  mean  and  sigma  method  gives  slightly  better  results 
than  the  characteristic  curve  method  for  verbal  data.  For  mathematical 
data,  the  characteristic  curve  method  worked  better  than  the  robust 
mean  and  sigma  method. 


Conclusions 

In situations where the robust mean and sigma transformation
method worked well, as in the verbal data and most of the mathematical
data, the characteristic curve method also worked well. However, the
robust mean and sigma method sometimes produced unsatisfactory results.
In these instances, the characteristic curve method worked much better.
In particular, the characteristic curve method produced a much better
transformation for the item discriminations (see Figure 6). If one is
choosing a transformation method, the authors recommend the characteristic
curve method, which uses more of the information available from each of
the calibrations.


Appendix 


Transforming Logistic Scales Using a Robust Iterative Weighted
Mean and Sigma Method

This  transformation  method  uses  a  function  of  the  estimated  standard 
errors  of  the  estimated  item  difficulties  for  common  items  as  weights  to 
determine  an  initial  transformation  line  based  on  mean  and  sigma  equating 
of  weighted  estimates  of  item  difficulties  for  the  common  items.  A  new  set 
of  weights  is  computed  using  a  combination  of  the  estimated  standard  error 
weights  and  robust  (Tukey)  weights  based  on  perpendicular  distances  to  the 
line.  A  new  transformation  line  is  computed  and  the  procedure  iterates  until 
the  maximum  change  in  the  perpendicular  distances  is  less  than  some  criterion. 

Method 

Computing the Standard Errors

The  inverse  of  the  information  matrix  I  (p.  191  of  Lord  (1980))  is 
an  approximation  to  the  variance/covariance  matrix  for  the  item  parameter 
estimates.  The  diagonal  element  of  the  inverse  corresponding  to  the 
item  difficulty  is  the  estimated  variance  of  the  estimate  of  item  difficulty. 
The square root of this quantity is the estimated standard error of the
estimate  of  item  difficulty. 

Each  item  has  two  estimated  item  difficulties,  one  from  each  calibration. 
Therefore,  each  item  has  two  estimated  standard  errors.  The  initial  weight 
for  an  item  to  be  used  in  the  iterative  procedure  is  the  reciprocal  of  the 
larger  estimated  squared  standard  error  of  the  estimated  item  difficulty. 
The accuracy with which an estimated standard error of b is computed
is gauged by the ratio of the determinant to the product of the diagonal
elements of the information matrix. If this ratio is less than 0.0001,
the estimated standard error is not accurate, and the item is given a
standard error weight of zero.
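The weighting rule can be sketched as follows, assuming a 3 x 3 information matrix per item (one row and column per item parameter). The function names are hypothetical, and treating the accuracy check as applying to either calibration is an assumption; the matrices and standard errors are made up.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def accuracy_ratio(info):
    """Determinant divided by the product of the diagonal elements."""
    return det3(info) / (info[0][0] * info[1][1] * info[2][2])

def initial_weight(se1, se2, ratio1=1.0, ratio2=1.0, min_ratio=1e-4):
    """Reciprocal of the larger squared SE, or zero if an SE is unreliable."""
    if ratio1 < min_ratio or ratio2 < min_ratio:
        return 0.0
    return 1.0 / max(se1, se2) ** 2

well_conditioned = [[4.0, 0.5, 0.1], [0.5, 3.0, 0.2], [0.1, 0.2, 2.0]]
singular = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

w_ok = initial_weight(0.2, 0.5, accuracy_ratio(well_conditioned))
w_bad = initial_weight(0.2, 0.5, accuracy_ratio(singular))
print(w_ok, w_bad)
```

Using the larger of the two standard errors means an item is down-weighted if it was poorly estimated in either calibration.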

All  people  are  included  in  the  computation  except  those  who  did  not 
reach  the  item. 


Computing  the  Mean  and  Sigma  Transformation 

We have two distributions of weighted estimated item difficulties,
one from each calibration. We let $b_1$ be the distribution from the first
calibration, and $b_2$ be the distribution from the second calibration, and
compute

$\mu_{b_1}$, the mean of $b_1$,
$\sigma_{b_1}$, the standard deviation of $b_1$,
$\mu_{b_2}$, the mean of $b_2$,
$\sigma_{b_2}$, the standard deviation of $b_2$.

The mean and sigma transformation (line) to put the second calibration
estimated item difficulties onto the scale of the first is

$$b_2^* = A b_2 + B$$

where $b_2^*$ is the transformed distribution from the second calibration.
For this transformation,

$$A = \sigma_{b_1} / \sigma_{b_2}$$

$$B = \mu_{b_1} - A \mu_{b_2}$$
Computing the Tukey Weights

Page  20  of  Mosteller  and  Tukey  (1977)  gives  a  method  of  computing 
a  robust  estimate  of  location  by  weighting  data  with  differential  weights. 
We  use  only  one  piece  of  this  process,  namely  the  formula  for  the  weights. 

For  our  purposes,  Y*  is  the  transformation  line  we  have  tentatively 
found.  We  replace  Tukey's  (Y(i)  -  Y*)  with  the  perpendicular  distance 
of  a  point  to  the  line. 

Let $D(i)$ equal the absolute value of the perpendicular distance.
Then our weights, $T(i)$, are

$$T(i) = \begin{cases} \left[1 - (D(i)/CS)^2\right]^2 & \text{when } (D(i)/CS)^2 < 1 \\ 0 & \text{otherwise} \end{cases}$$

where $S$ is the median of the $D(i)$ and $C$ is a constant equal to 6.
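The weight formula translates directly into code. The sketch below is illustrative; the function name is hypothetical, the distances are made up, and the handling of a zero median is an added assumption that the source does not discuss.

```python
import statistics as st

def tukey_weights(distances, C=6.0):
    """Biweight-style weights from absolute perpendicular distances."""
    D = [abs(d) for d in distances]
    S = st.median(D)
    if S == 0:
        # Assumed convention: only points exactly on the line keep weight.
        return [1.0 if d == 0 else 0.0 for d in D]
    weights = []
    for d in D:
        u2 = (d / (C * S)) ** 2
        weights.append((1.0 - u2) ** 2 if u2 < 1 else 0.0)
    return weights

# Made-up distances; the last point lies far from the line.
T = tukey_weights([0.0, 0.1, 0.2, 0.3, 5.0])
print(T)
```

Here the median distance is 0.2, so any point farther than 1.2 from the line receives zero weight, while points near the line keep weights close to one.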


The  Iterative  Procedure 

The  iterative  procedure  is  as  follows: 

Step 1: For each item difficulty, for each common item, compute

$W(i) = SE(B(i))^{-2}$

where SE(B) is the larger of the two estimated standard errors.

Step 2: Compute a vector of scaled weights

$W(i)' = W(i) / (\text{sum of } W(i))$

Step 3: Compute the mean and sigma transformation line between the two
sets of estimated item difficulties weighted by $W'$, and get
the slope, A, and the intercept, B.

Step 4: Compute the perpendicular distances of each point to the line.

Step 5: Compute the Tukey weights, $T(i)$, for each item, using these
perpendicular distances.

Step 6: Reweight each point by a combined weight $U(i)$, where

$U(i) = (W(i) \cdot T(i)) / (\text{sum of } W(i) \cdot T(i))$

Step 7: Compute the weighted mean and sigma transformation line using
these new weights.

Step 8: Repeat Steps 4 through 7 until the maximum change in the
perpendicular distances is less than 0.01.
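The steps above can be sketched end to end. This is an illustrative reading, not the authors' program: it recomputes the line inside the loop (an interpretation of Step 8), plots second-calibration difficulties on the horizontal axis, and guards against a zero median distance. All function names and data are assumptions.

```python
import math
import statistics as st

def weighted_mean_sigma(b1, b2, w):
    """Weighted mean and sigma line b1 = A*b2 + B."""
    sw = sum(w)
    m1 = sum(wi * x for wi, x in zip(w, b1)) / sw
    m2 = sum(wi * x for wi, x in zip(w, b2)) / sw
    s1 = math.sqrt(sum(wi * (x - m1) ** 2 for wi, x in zip(w, b1)) / sw)
    s2 = math.sqrt(sum(wi * (x - m2) ** 2 for wi, x in zip(w, b2)) / sw)
    A = s1 / s2
    return A, m1 - A * m2

def perp_distances(b1, b2, A, B):
    """Absolute perpendicular distance of each (b2, b1) point to the line."""
    return [abs(A * x2 - x1 + B) / math.sqrt(A * A + 1.0)
            for x1, x2 in zip(b1, b2)]

def tukey_weights(D, C=6.0):
    S = st.median(D)
    if S == 0:                                   # assumed convention
        return [1.0 if d == 0 else 0.0 for d in D]
    return [(1.0 - (d / (C * S)) ** 2) ** 2 if d < C * S else 0.0 for d in D]

def robust_mean_sigma(b1, b2, se1, se2, tol=0.01, max_iter=50):
    W = [1.0 / max(s, t) ** 2 for s, t in zip(se1, se2)]      # Step 1
    total = sum(W)
    W = [w / total for w in W]                                # Step 2
    A, B = weighted_mean_sigma(b1, b2, W)                     # Step 3
    D_old = None
    for _ in range(max_iter):
        D = perp_distances(b1, b2, A, B)                      # Step 4
        if D_old is not None and max(abs(d - e) for d, e in zip(D, D_old)) < tol:
            break                                             # Step 8 criterion
        T = tukey_weights(D)                                  # Step 5
        U = [w * t for w, t in zip(W, T)]                     # Step 6
        total = sum(U)
        U = [u / total for u in U]
        A, B = weighted_mean_sigma(b1, b2, U)                 # Step 7
        D_old = D
    return A, B

# Made-up data: nine items exactly on b1 = 1.25*b2 + 0.4, plus one outlier
# with a large standard error.
b2 = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 0.0]
b1 = [1.25 * x + 0.4 for x in b2[:-1]] + [3.4]
se1 = [0.1] * 9 + [1.0]
se2 = [0.1] * 9 + [1.0]

A, B = robust_mean_sigma(b1, b2, se1, se2)
print(A, B)
```

In this example the outlier receives a small standard error weight at the start and a Tukey weight of zero after the first pass, so the final line is determined by the nine well-behaved items.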

Result 

This procedure gives low weights to poorly determined item
difficulties or to item difficulties which are outliers. Once the final
transformation is found for the estimated item difficulties, the
estimated item discriminations are transformed, as well as the ability
estimates.